gh-150942: Speed up csv.reader row building#150995
Conversation
| } | ||
| if (PyList_Append(self->fields, field) < 0) { | ||
| Py_DECREF(field); | ||
| if (_PyList_AppendTakeRef((PyListObject *)self->fields, field) < 0) { |
There was a problem hiding this comment.
The list used here is self->fields. Can we guarantee that it is not mutated by concurrent threads?
There was a problem hiding this comment.
when parse_save_field is running, self->fields is only reachable through self, and self is locked by `Reader_iternext's critical section. i dont think any other thread can mutate the list.
static PyObject *
Reader_iternext(PyObject *op)
{
PyObject *result;
Py_BEGIN_CRITICAL_SECTION(op);
result = Reader_iternext_lock_held(op);
Py_END_CRITICAL_SECTION();
return result;
}There was a problem hiding this comment.
what do you think?
There was a problem hiding this comment.
The object is locked and self->fields is not accessible via other paths so this is indeed safe.
If redesigning I would make probably make fields (and maybe some other fields) a local variable inside Reader_iternext_lock_held (only minor thing to take care of: the fields currently acts as a guard for re-entrant calls). This is out of scope for the PR though.
I would drop the new entry (or at least omit the implementation details).
@omkar-334 This needs to be reviewed by a core dev, might it might take some time.
Use
_PyList_AppendTakeRefwhile collecting fields incsv.readerinstead ofPyList_Appendfollowed byPy_DECREF, removing an incref/decref pair perparsed field.
Microbenchmarks
Release build, macOS
csv_reader_2m_fieldscsv_reader_400k_fields_8colcsv_reader_80k_fieldsBenchmark script